Abstract

An inequality of Brascamp and Lieb provides a bound on the covariance of two
functions with respect to log-concave measures. The bound estimates the covariance
by the product of the $L^2$ norms of the gradients of the functions, where the magnitude
of the gradient is computed using an inner product given by the inverse Hessian
matrix of the potential of the log-concave measure. Menz and Otto [13] proved a
variant of this with the two $L^2$ norms replaced by $L^1$ and $L^\infty$ norms, but only for
$\mathbb{R}^1$. We prove a generalization of both by extending these inequalities to $L^p$ and $L^q$
norms and on $\mathbb{R}^n$, for any $n \ge 1$. We also prove an inequality for integrals of divided
differences of functions in terms of integrals of their gradients.

Mathematics subject classification number: 26D10 
1 Introduction

Let $f$ be a $C^2$ strictly convex function on $\mathbb{R}^n$ such that $e^{-f}$ is integrable. By strictly convex,
we mean that the Hessian matrix, $\mathrm{Hess}_f$, of $f$ is everywhere positive.
2 Work partially supported by U.S. National Science Foundation grant PHY 0965859.
© 2011 by the authors. This paper may be reproduced, in its entirety, for non-commercial purposes. 
Adding a constant to $f$, we may suppose that

$$\int_{\mathbb{R}^n} e^{-f(x)}\,d^n x = 1 .$$

Let $d\mu$ denote the probability measure

$$d\mu := e^{-f(x)}\,d^n x , \tag{1.1}$$

and let $\|\cdot\|_p$ denote the corresponding $L^p(\mu)$-norm.

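For any two real-valued functions $g, h \in L^2(\mu)$, the covariance of $g$ and $h$ is the quantity

$$\mathrm{cov}(g,h) := \int_{\mathbb{R}^n} g h\,d\mu - \left(\int_{\mathbb{R}^n} g\,d\mu\right)\left(\int_{\mathbb{R}^n} h\,d\mu\right) , \tag{1.2}$$

and the variance of $h$ is $\mathrm{var}(h) = \mathrm{cov}(h,h)$.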

The Brascamp-Lieb (BL) inequality [4] for the variance of $h$ is

$$\mathrm{var}(h) \le \int_{\mathbb{R}^n} (\nabla h, \mathrm{Hess}_f^{-1}\nabla h)\,d\mu , \tag{1.3}$$

where $(x,y)$ denotes the inner product in $\mathbb{R}^n$. (We shall also use $x\cdot y$ to denote this same
inner product in simpler expressions where it is more convenient.)

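Since $(\mathrm{cov}(g,h))^2 \le \mathrm{var}(g)\,\mathrm{var}(h)$, an immediate consequence of (1.3) is

$$(\mathrm{cov}(g,h))^2 \le \int_{\mathbb{R}^n} (\nabla g, \mathrm{Hess}_f^{-1}\nabla g)\,d\mu \int_{\mathbb{R}^n} (\nabla h, \mathrm{Hess}_f^{-1}\nabla h)\,d\mu . \tag{1.4}$$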

The one-dimensional variant of (1.4), due to Otto and Menz [13], is

$$|\mathrm{cov}(g,h)| \le \|\nabla g\|_1\,\|\mathrm{Hess}_f^{-1}\nabla h\|_\infty = \sup_x\left\{\frac{|h'(x)|}{f''(x)}\right\}\int_{\mathbb{R}} |g'(x)|\,d\mu(x) \tag{1.5}$$

for functions $g$ and $h$ on $\mathbb{R}^1$. They call this an asymmetric Brascamp-Lieb inequality. Note
that it is asymmetric in two respects: One respect is to take an $L^1$ norm of $\nabla g$ and an $L^\infty$
norm of $\nabla h$, instead of $L^2$ and $L^2$. The second respect is that the $L^\infty$ norm is weighted
with the inverse Hessian (which here is simply a number) while the $L^1$ norm is not
weighted.

Our first result is the following theorem, which generalizes both (1.4) and (1.5).

1.1 THEOREM (Asymmetric BL inequality). Let $d\mu(x)$ be as in (1.1) and let $\lambda_{\min}(x)$
denote the least eigenvalue of $\mathrm{Hess}_f(x)$. For any locally Lipschitz functions $g$ and $h$ on $\mathbb{R}^n$
that are square integrable with respect to $d\mu$, and for $2 \le p \le \infty$, $1/p + 1/q = 1$,

$$|\mathrm{cov}(g,h)| \le \|\mathrm{Hess}_f^{-1/p}\nabla g\|_q\,\|\lambda_{\min}^{(2-p)/p}\,\mathrm{Hess}_f^{-1/p}\nabla h\|_p . \tag{1.6}$$

This is sharp in the sense that (1.6) cannot hold, generally, with a constant smaller than
1 on the right side.
For $p = 2$, (1.6) is (1.4). Note that (1.6) implies in particular that for Lipschitz functions
on $\mathbb{R}^n$,

$$|\mathrm{cov}(g,h)| \le \|\lambda_{\min}^{-1/p}\nabla g\|_q\,\|\lambda_{\min}^{-1/q}\nabla h\|_p .$$

For $p = \infty$ and $q = 1$, the latter is

$$|\mathrm{cov}(g,h)| \le \|\nabla g\|_1\,\|\lambda_{\min}^{-1}\nabla h\|_\infty \tag{1.7}$$

which for $n = 1$ reproduces exactly (1.5).
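The one-dimensional inequality (1.5) can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: the potential $f(x) = x^2/2 + x^4/4$ (normalised numerically), the test functions $g = \sin x$ and $h = x + x^3/3$, and a plain midpoint rule on a truncated interval. The function name `check_menz_otto` is hypothetical.

```python
import math

def check_menz_otto(points=4000, half_width=6.0):
    """Return (|cov(g,h)|, ||g'||_1 * sup |h'|/f'') for f(x) = x^2/2 + x^4/4 + const."""
    # Illustrative choices (assumptions, not taken from the paper).
    potential = lambda x: 0.5 * x * x + 0.25 * x ** 4
    second_derivative = lambda x: 1.0 + 3.0 * x * x     # f''(x) >= 1 > 0
    g, g_prime = math.sin, math.cos
    h = lambda x: x + x ** 3 / 3.0
    h_prime = lambda x: 1.0 + x * x

    # Midpoint grid on [-half_width, half_width]; e^{-f} is negligible outside.
    step = 2.0 * half_width / points
    grid = [-half_width + (k + 0.5) * step for k in range(points)]
    weights = [math.exp(-potential(x)) for x in grid]
    total = sum(weights)
    prob = [w / total for w in weights]                 # discrete version of d(mu)

    mean_g = sum(p * g(x) for p, x in zip(prob, grid))
    mean_h = sum(p * h(x) for p, x in zip(prob, grid))
    cov = sum(p * g(x) * h(x) for p, x in zip(prob, grid)) - mean_g * mean_h
    l1_norm = sum(p * abs(g_prime(x)) for p, x in zip(prob, grid))
    sup_ratio = max(abs(h_prime(x)) / second_derivative(x) for x in grid)
    return abs(cov), l1_norm * sup_ratio
```

Calling `check_menz_otto()` returns the two sides of (1.5) for this particular example; the first value should not exceed the second.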

We also prove the following theorem. In addition to its intrinsic interest, it gives rise to
an alternative proof, which we give later, of Theorem 1.1 in the case $p = \infty$ (though this
proof only yields the sharp constant for $\mathbb{R}^1$, which is the original Otto-Menz case (1.5)).

1.2 THEOREM (Divided differences and gradients). Let $\mu$ be a probability measure with
log-concave density (1.1). For any locally Lipschitz function $h$ on $\mathbb{R}^n$,

$$\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|h(x)-h(y)|}{|x-y|}\,d\mu(x)\,d\mu(y) \le 2^n \int_{\mathbb{R}^n} |\nabla h(x)|\,d\mu . \tag{1.8}$$



1.3 Remark. The constant $2^n$ is not optimal, as indicated by the examples in Section 4
(we will actually briefly mention how to reach the constant $2^{n/2}$). We do not know whether
the correct constant grows with $n$ (and then how), or is bounded uniformly in $n$. We do
know that for $n = 1$, the constant is at least $2\ln 2$. We will return to this later.

The rest of the paper is organized as follows: Section 2 contains the proof of Theorem 1.1,
and Section 3 contains the proof of Theorem 1.2, as well as an explanation of
the connection between the two theorems. Section 4 contains comments and examples
concerning the constant and optimizers in Theorem 1.2. Section 5 contains a discussion
of an application that motivated Otto and Menz, and finally, Section 6 is an appendix
providing some additional details on the original proof of the Brascamp-Lieb inequalities,
which proceeds by induction on the dimension, and has an interesting connection with the
application discussed in Section 5.

We end this introduction by expressing our gratitude to D. Bakry and M. Ledoux
for fruitful exchanges on the preliminary version of our work. We originally proved (1.7)
with the constant $n2^n$ using Theorem 1.2 as explained in Section 3. Bakry and Ledoux
pointed out to us that using a stochastic representation of the gradient along the semigroup
associated to $\mu$ (sometimes referred to as the Bismut formula), one could derive
inequality (1.7) with the right constant 1. This provided evidence that something more
algebraic was at stake. It was confirmed by our general statement Theorem 1.1 and by its
proof below.
2 Bounds on Covariance

The starting point of the proof we now give for Theorem 1.1 is a classical dual representation
for the covariance which, in the somewhat parallel setting of plurisubharmonic potentials,
goes back to the work of Hörmander. We shall then adapt to our $L^p$ setting Hörmander's
$L^2$ approach [8] to spectral estimates.

Let $g$ and $h$ be smooth and compactly supported on $\mathbb{R}^n$. Define the operator $L$ by

$$L = \Delta - \nabla f\cdot\nabla , \tag{2.1}$$

and note that

$$\int_{\mathbb{R}^n} g(x)\,Lh(x)\,d\mu(x) = -\int_{\mathbb{R}^n} \nabla g(x)\cdot\nabla h(x)\,d\mu(x) , \tag{2.2}$$

so that $L$ is self-adjoint on $L^2(\mu)$. Let us (temporarily) add $\epsilon|x|^2$ to $f$ to make it uniformly
convex, so that the Hessian of $f$ is invertible and so that the operator $L$ has a spectral
gap. (Actually, $L$ always has a spectral gap since $\mu$ is a log-concave probability measure,
as noted in [9, 1]. Our simple regularization makes our proof independent of these deep
results.)

Then provided

$$\int_{\mathbb{R}^n} h(x)\,d\mu(x) = 0 , \tag{2.3}$$

the function

$$u := -\int_0^\infty e^{tL}h(x)\,dt \tag{2.4}$$

exists and is in the domain of $L$, and satisfies $Lu = h$.

Thus, assuming (2.3), and by standard approximation arguments,

$$\mathrm{cov}(g,h) = \int_{\mathbb{R}^n} g(x)h(x)\,d\mu(x) = \int_{\mathbb{R}^n} g(x)\,Lu(x)\,d\mu(x) = -\int_{\mathbb{R}^n} \nabla g(x)\cdot\nabla u(x)\,d\mu(x) . \tag{2.5}$$

This representation for the covariance is the starting point of the proof we now give for
Theorem 1.1.
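As a concrete instance of (2.4) and (2.5), one can work out the Gaussian case by hand. The choice $f(x) = |x|^2/2 + \tfrac{n}{2}\ln(2\pi)$ and $h(x) = x_1$ is an illustration added here, not part of the original argument; for this $f$ the operator (2.1) is $L = \Delta - x\cdot\nabla$.

```latex
% Worked example (illustrative assumption: standard Gaussian measure, h(x) = x_1).
\begin{align*}
L x_1 &= \Delta x_1 - x\cdot\nabla x_1 = -x_1
   \quad\Longrightarrow\quad e^{tL}x_1 = e^{-t}x_1, \\
u &= -\int_0^\infty e^{-t}x_1\,dt = -x_1, \qquad Lu = x_1 = h, \\
\mathrm{cov}(g,x_1) &= -\int_{\mathbb{R}^n}\nabla g\cdot\nabla u\,d\mu
   = \int_{\mathbb{R}^n}\partial_1 g\,d\mu .
\end{align*}
```

The last line is the familiar Gaussian integration-by-parts identity, recovered here as a special case of (2.5).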

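Proof of Theorem 1.1: Fix $2 \le p < \infty$, and let $q = p/(p-1)$, as in the statement of the
theorem. Suppose $h$ satisfies (2.3), and define $u$ by (2.4) so that $Lu = h$. Then from (2.5),

$$\begin{aligned}
|\mathrm{cov}(g,h)| &\le \left|\int_{\mathbb{R}^n} \nabla g(x)\cdot\nabla u(x)\,d\mu(x)\right| \\
&\le \int_{\mathbb{R}^n} \left|\mathrm{Hess}_f^{-1/p}\nabla g(x)\cdot\mathrm{Hess}_f^{1/p}\nabla u(x)\right| d\mu(x) \\
&\le \|\mathrm{Hess}_f^{-1/p}\nabla g\|_q\,\|\mathrm{Hess}_f^{1/p}\nabla u\|_p .
\end{aligned} \tag{2.6}$$

Thus, to prove (1.6) for $2 \le p < \infty$, it suffices to prove the following $W^{-1,p}$-$W^{1,p}$ type
estimate:

$$\|\mathrm{Hess}_f^{1/p}\nabla u\|_p \le \|\lambda_{\min}^{(2-p)/p}\,\mathrm{Hess}_f^{-1/p}\nabla h\|_p . \tag{2.7}$$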
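Toward this end, we compute

$$\begin{aligned}
L(|\nabla u|^p) &= p|\nabla u|^{p-2}(L\nabla u)\cdot\nabla u + p|\nabla u|^{p-2}\,\mathrm{Tr}(\mathrm{Hess}_u^2) + p(p-2)|\nabla u|^{p-4}|\mathrm{Hess}_u\nabla u|^2 \\
&\ge p|\nabla u|^{p-2}(L\nabla u)\cdot\nabla u ,
\end{aligned} \tag{2.8}$$

where we have used the fact that $p \ge 2$, and where the notation $L(\nabla u)$ refers to the
coordinate-wise action $(L\partial_1 u, \dots, L\partial_n u)$ of $L$.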

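Then, using the commutation formula (see the remark below)

$$L(\nabla u) = \nabla(Lu) + \mathrm{Hess}_f\nabla u , \tag{2.9}$$

we obtain

$$0 = \int L(|\nabla u|^p)\,d\mu(x) \ge p\int |\nabla u|^{p-2}\,\nabla u\cdot\nabla h\,d\mu(x) + p\int |\nabla u|^{p-2}\,\nabla u\cdot\mathrm{Hess}_f\nabla u\,d\mu(x) ,$$

and hence

$$\int |\nabla u|^{p-2}\,|\mathrm{Hess}_f^{1/2}\nabla u|^2\,d\mu(x) \le \int |\nabla u|^{p-2}\,|\mathrm{Hess}_f^{1/p}\nabla u|\,|\mathrm{Hess}_f^{-1/p}\nabla h|\,d\mu(x) . \tag{2.10}$$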



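We now observe that for any positive $n\times n$ matrix $A$ and any vector $v \in \mathbb{R}^n$,

$$|A^{1/p}v|^p \le |v|^{p-2}\,|A^{1/2}v|^2 .$$

To see this, note that we may suppose $|v| = 1$. Then in the spectral representation of $A$,
with eigenvalues $\lambda_j$ and $v_j$ the components of $v$ in the eigenbasis, by Jensen's inequality
(since $p/2 \ge 1$ and $\sum_j v_j^2 = 1$),

$$|A^{1/p}v|^p = \left(\sum_{j=1}^n \lambda_j^{2/p}v_j^2\right)^{p/2} \le \sum_{j=1}^n \lambda_j v_j^2 = |A^{1/2}v|^2 .$$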
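The matrix inequality $|A^{1/p}v|^p \le |v|^{p-2}|A^{1/2}v|^2$ can be probed numerically in the eigenbasis of $A$, where $A$ acts diagonally. The sketch below is an illustration only: the sampling ranges, the dimension, and the function names `check_power_inequality` and `worst_ratio` are assumptions made for this example.

```python
import random

def check_power_inequality(eigenvalues, v, p):
    """Return (lhs, rhs) of |A^{1/p} v|^p <= |v|^{p-2} |A^{1/2} v|^2 for diagonal A."""
    norm_v_sq = sum(c * c for c in v)
    # |A^{1/p} v|^2 = sum_j lambda_j^{2/p} v_j^2 in the eigenbasis of A
    lhs = sum(lam ** (2.0 / p) * c * c for lam, c in zip(eigenvalues, v)) ** (p / 2.0)
    # |A^{1/2} v|^2 = sum_j lambda_j v_j^2
    rhs = norm_v_sq ** ((p - 2.0) / 2.0) * sum(lam * c * c for lam, c in zip(eigenvalues, v))
    return lhs, rhs

def worst_ratio(trials=2000, n=4, seed=0):
    """Largest observed lhs/rhs over random positive spectra, vectors and p in [2, 8]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        eig = [rng.uniform(0.1, 10.0) for _ in range(n)]
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        p = rng.uniform(2.0, 8.0)
        lhs, rhs = check_power_inequality(eig, v, p)
        worst = max(worst, lhs / rhs)
    return worst
```

The ratio stays at or below 1 (up to rounding), with equality when all eigenvalues coincide, which is the equality case of Jensen's inequality.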



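Using this inequality on the left side of (2.10), and using the obvious estimate

$$|\nabla u| \le \lambda_{\min}^{-1/p}\,|\mathrm{Hess}_f^{1/p}\nabla u|$$

on the right, we have

$$\|\mathrm{Hess}_f^{1/p}\nabla u\|_p^p \le \int |\mathrm{Hess}_f^{1/p}\nabla u|^{p-1}\,|\lambda_{\min}^{(2-p)/p}\,\mathrm{Hess}_f^{-1/p}\nabla h|\,d\mu(x) . \tag{2.11}$$

Then by Hölder's inequality we obtain (2.7).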
It is now obvious that we can take the limit in which $\epsilon$ tends to zero, so that we
obtain the inequality without any additional hypotheses on $f$. Our calculations so far have
required $2 \le p < \infty$; however, having obtained the inequality for such $p$, by taking the
limit in which $p$ goes to infinity, we obtain the $p = \infty$, $q = 1$ case of the theorem.

Finally, considering the case in which

$$d\mu(x) = (2\pi)^{-n/2}e^{-|x|^2/2}\,dx ,$$

and $g = h = x_1$, we have that $\mathrm{Hess}_f = \mathrm{Id}$ and so

$$\lambda_{\min} = |\mathrm{Hess}_f^{-1/p}\nabla g| = |\mathrm{Hess}_f^{-1/p}\nabla h| = 1$$

for all $x$, while $\mathrm{cov}(g,h) = \mathrm{var}(x_1) = 1$, and so the constant is sharp, as claimed. □

2.1 Remark. Many special cases and variants of the commutation relation (2.9) are well
known under different names. Perhaps most directly relevant here is the case in which
$f(x) = |x|^2/2$. Then $\partial_j$ and its adjoint in $L^2(\mu)$, $\partial_j^* = x_j - \partial_j$, satisfy the canonical
commutation relations, and the operator $L = -\sum_{j=1}^n \partial_j^*\partial_j$ is (minus) the harmonic
oscillator Hamiltonian in the ground state representation. This special case of (2.9), in which
the Hessian on the right is the identity, is the basis of the standard determination of the
spectrum of the quantum harmonic oscillator using "raising and lowering operators".

In the setting of Riemannian manifolds, a commutation relation analogous to (2.9) in
which $L$ is the Laplace-Beltrami operator and the Hessian is replaced by Ric, the Ricci
curvature tensor, is known as the Bochner-Lichnerowicz formula. Both the Hessian version
(2.9) and the Bochner-Lichnerowicz version have been used a number of times to prove
inequalities related to those we consider here, for instance in the work of Bakry and Émery
on logarithmic Sobolev inequalities.

We note that our proof immediately extends, word for word, to the Riemannian setting
if we use, in place of (2.9), the commutation satisfied by the operator $L$ given by (2.1)
where $f$ is a (smooth) potential on the manifold; that is, with some abuse of notation,
$L(\nabla u) = \nabla(Lu) + \mathrm{Hess}_f\nabla u + \mathrm{Ric}\,\nabla u$, or rather, more rigorously,

$$L(|\nabla u|^p) \ge p|\nabla u|^{p-2}\left[\nabla(Lu)\cdot\nabla u + \mathrm{Hess}_f\nabla u\cdot\nabla u + \mathrm{Ric}\,\nabla u\cdot\nabla u\right] .$$

Thus, an analog of Theorem 1.1 holds on a Riemannian manifold $M$ equipped with a
probability measure

$$d\mu(x) = e^{-f(x)}\,d\mathrm{vol}(x)$$

where $d\mathrm{vol}$ is the Riemannian element of volume and $f$ a smooth function on $M$, provided
$\mathrm{Hess}_f$ at each point $x$ is replaced in the statement by the symmetric operator

$$H_x = \mathrm{Hess}_f(x) + \mathrm{Ric}_x$$

defined on the tangent space. Of course, the convexity condition on $f$ is accordingly
replaced by the assumption that $H_x > 0$ at every point $x \in M$.
3 Bounds on Differences

Proof of Theorem 1.2: Since $h(x) - h(y) = \int_0^1 \nabla h(x_t)\cdot(x-y)\,dt$, we have

$$|h(x)-h(y)| \le |x-y|\int_0^1 |\nabla h(x_t)|\,dt \quad\text{where}\quad x_t := tx + (1-t)y . \tag{3.1}$$

Next, by the convexity of $f$,

$$e^{-f(x)}e^{-f(y)} = e^{-(1-t)f(x)}e^{-tf(y)}\,e^{-tf(x)}e^{-(1-t)f(y)} \le e^{-f(x_t)}\,e^{-(1-t)f(x)}e^{-tf(y)} . \tag{3.2}$$

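Introduce the variables

$$w = tx + (1-t)y , \qquad z = x - y . \tag{3.3}$$

A simple computation of the Jacobian shows that this change of variables is a measure
preserving transformation for all $0 \le t \le 1$, and hence

$$\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|h(x)-h(y)|}{|x-y|}\,d\mu(x)\,d\mu(y) \le \int_0^1\left(\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |\nabla h(w)|\,e^{-(1-t)f(w+(1-t)z)}\,e^{-tf(w-tz)}\,dz\,d\mu(w)\right)dt . \tag{3.4}$$

We estimate the right side of (3.4). By Hölder's inequality,

$$\int_{\mathbb{R}^n} e^{-(1-t)f(w+(1-t)z)}\,e^{-tf(w-tz)}\,dz \le \left(\int_{\mathbb{R}^n} e^{-f(w+(1-t)z)}\,dz\right)^{1-t}\left(\int_{\mathbb{R}^n} e^{-f(w-tz)}\,dz\right)^{t} . \tag{3.5}$$

But

$$\int_{\mathbb{R}^n} e^{-f(w+(1-t)z)}\,dz = (1-t)^{-n} \qquad\text{and}\qquad \int_{\mathbb{R}^n} e^{-f(w-tz)}\,dz = t^{-n} ,$$

and finally, $(1-t)^{-n(1-t)}\,t^{-nt} = e^{-n(t\log t + (1-t)\log(1-t))} \le 2^n$. □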
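The last step of the proof uses that the factor $(1-t)^{-n(1-t)}t^{-nt}$ is the exponential of $n$ times the binary entropy of $t$, which is maximal at $t = 1/2$. A minimal numerical sketch of this bound follows; the grid resolution and the function names are choices made for this illustration.

```python
import math

def holder_factor(t, n):
    """(1-t)^(-n(1-t)) * t^(-n t), written as exp(n * H(t)) with H the binary entropy."""
    entropy = -(t * math.log(t) + (1.0 - t) * math.log(1.0 - t))
    return math.exp(n * entropy)

def max_factor(n, steps=999):
    """Maximum of the factor over the grid t = k/(steps+1), k = 1..steps."""
    return max(holder_factor(k / (steps + 1), n) for k in range(1, steps + 1))
```

With the default grid, $t = 1/2$ is a grid point, so `max_factor(n)` should agree with $2^n$ up to rounding.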

A corollary of Theorem 1.2 is a proof of Theorem 1.1 for the special case of $q = 1$ and
$p = \infty$. This proof is not only restricted to this case, it also has the defect that the constant
is not sharp, except in one dimension. We give it, nevertheless, because it establishes a
link between the two theorems.

Alternative Proof of Theorem 1.1 for $q = 1$: We shall use the identity

$$\mathrm{cov}(g,h) = \frac12\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} [g(x)-g(y)]\,[h(x)-h(y)]\,d\mu(x)\,d\mu(y) , \tag{3.6}$$



and estimate the differences on the right in different ways. 
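Identity (3.6) holds for any probability measure, so it can be checked exactly on a finite discrete measure. The sketch below does this; the discrete setting and the function name `covariance_two_ways` are assumptions made purely for illustration.

```python
def covariance_two_ways(weights, g_values, h_values):
    """Return cov(g,h) computed directly and via the double-difference identity (3.6)."""
    total = sum(weights)
    prob = [w / total for w in weights]           # normalise to a probability measure
    mean_g = sum(p * a for p, a in zip(prob, g_values))
    mean_h = sum(p * b for p, b in zip(prob, h_values))
    direct = sum(p * a * b for p, a, b in zip(prob, g_values, h_values)) - mean_g * mean_h
    doubled = 0.0
    for p_x, g_x, h_x in zip(prob, g_values, h_values):
        for p_y, g_y, h_y in zip(prob, g_values, h_values):
            doubled += 0.5 * (g_x - g_y) * (h_x - h_y) * p_x * p_y
    return direct, doubled
```

Both returned values agree up to floating-point rounding for any choice of weights and function values.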

Fix any $x \ne y$ in $\mathbb{R}^n$, and define the vector $v := x - y$, and for $0 \le t \le 1$, define
$x_t = y + tv = tx + (1-t)y$. Then for any Lipschitz function $h$,

$$h(x) - h(y) = \int_0^1 v\cdot\nabla h(x_t)\,dt . \tag{3.7}$$

Now note that

$$\frac{d}{dt}\,v\cdot\nabla f(x_t) = (v, \mathrm{Hess}_f(x_t)v) \ge |x-y|^2\,\lambda_{\min}(x_t) > 0 . \tag{3.8}$$

Integrating this in $t$ from 0 to 1, we obtain

$$(x-y, \nabla f(x) - \nabla f(y)) = \int_0^1 (v, \mathrm{Hess}_f(x_t)v)\,dt > 0 ,$$

which expresses the well-known monotonicity of gradients of convex functions.
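Next, multiplying and dividing by $(v, \mathrm{Hess}_f(x_t)v)$ in (3.7), we obtain

$$\begin{aligned}
|h(x)-h(y)| &= \left|\int_0^1 (v,\mathrm{Hess}_f(x_t)v)\,(v,\mathrm{Hess}_f(x_t)v)^{-1}\,v\cdot\nabla h(x_t)\,dt\right| \\
&\le \int_0^1 (v,\mathrm{Hess}_f(x_t)v)\,\left|(v,\mathrm{Hess}_f(x_t)v)^{-1}\,v\cdot\nabla h(x_t)\right| dt \\
&\le \int_0^1 (v,\mathrm{Hess}_f(x_t)v)\,\left|(\lambda_{\min}(x_t))^{-1}\,|x-y|^{-2}\,v\cdot\nabla h(x_t)\right| dt \\
&\le \sup_z\left\{\frac{|\nabla h(z)|}{\lambda_{\min}(z)}\right\}|x-y|^{-1}\int_0^1 (v,\mathrm{Hess}_f(x_t)v)\,dt \\
&= \sup_z\left\{\frac{|\nabla h(z)|}{\lambda_{\min}(z)}\right\}|x-y|^{-1}\,(x-y, \nabla f(x)-\nabla f(y)) .
\end{aligned} \tag{3.9}$$

Define

$$C := \sup_z\left\{\frac{|\nabla h(z)|}{\lambda_{\min}(z)}\right\}$$

and use (3.9) in (3.6):

$$\begin{aligned}
|\mathrm{cov}(g,h)| &\le \frac12\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |g(x)-g(y)|\,|h(x)-h(y)|\,d\mu(x)\,d\mu(y) \\
&\le \frac{C}{2}\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |g(x)-g(y)|\,\frac{1}{|x-y|}\,(x-y)\cdot[\nabla f(x)-\nabla f(y)]\,d\mu(x)\,d\mu(y) \\
&= -C\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |g(x)-g(y)|\,\frac{1}{|x-y|}\,(x-y)\cdot\nabla_x e^{-f(x)}\,e^{-f(y)}\,d^n x\,d^n y ,
\end{aligned} \tag{3.10}$$

where, in the last line, we have used symmetry in $x$ and $y$.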

Now integrate by parts in $x$. Suppose first that $n > 1$. Then

$$\mathrm{div}\left(\frac{z}{|z|}\right) = \frac{n-1}{|z|}$$

and $|\nabla_x|g(x)-g(y)|| = |\nabla_x g(x)|$ almost everywhere. Hence we obtain

$$|\mathrm{cov}(g,h)| \le C\left(\int_{\mathbb{R}^n} |\nabla g(x)|\,d\mu(x) + (n-1)\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|g(x)-g(y)|}{|x-y|}\,d\mu(x)\,d\mu(y)\right) . \tag{3.11}$$

For $n = 1$, $\mathrm{div}(z/|z|) = 2\delta_0(z)$ and (3.11) is still valid since $|g(x)-g(y)|\,\delta_0(x-y) = 0$.

Now, for $n = 1$, (3.11) reduces directly to (1.5). For $n > 1$, it reduces to (1.7) upon
application of Theorem 1.2, but with the constant $n2^n$ instead of 1. □



4 Examples and Remarks on Optimizers in Theorem 1.2

Our first examples address the question of the importance of log-concavity. 

(1.) Some restriction on $\mu$ is necessary: If a measure $d\mu(x) = F(x)\,dx$ on $\mathbb{R}$ has
$F(a) = 0$ for some $a \in \mathbb{R}$, and $F$ has positive mass to the left and right of $a$, then inequality
(1.8) cannot possibly hold with any constant. The choice of $h$ to be the Heaviside step
function shows that (1.8) cannot hold with any constant for this $\mu$.

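(2.) Unimodality is not enough: Take $d\mu(x) = F(x)\,dx$, with $F(x) = 1/(4\epsilon)$ on
$(-\epsilon,\epsilon)$ and $F(x) = 1/(4(1-\epsilon))$ otherwise on the interval $(-1,1)$ and $F(x) = 0$ for $|x| > 1$.
Let $g(x) = 1$ for $|x| < \epsilon + \delta$ and $g(x) = 0$ otherwise. When $\delta$ is positive but small,

$$\int_{\mathbb{R}} |\nabla g|\,d\mu(x) = \frac{1}{2(1-\epsilon)}$$

while

$$\int_{\mathbb{R}}\int_{\mathbb{R}} \frac{|g(x)-g(y)|}{|x-y|}\,d\mu(x)\,d\mu(y) = O(-\ln(\epsilon)) .$$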

(3.) For $n = 1$, the best constant in (1.8) is at least $2\ln 2$: Take $d\mu(x) = F(x)\,dx$,
with $F(x) = 1/2$ on $(-1,1)$ and $F(x) = 0$ for $|x| > 1$. Let $g(x) = 1$ for $x > 0$ and $g(x) = 0$
for $x < 0$. All integrals are easily computed.
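The value $2\ln 2$ in example (3.) can be reproduced numerically. The sketch below approximates the double integral on the left of (1.8) by a midpoint rule; the grid size and the names `divided_difference_ratio` and `REFERENCE` are choices made for this illustration, and the midpoint rule is only approximate near the integrable singularity at $x = y = 0$.

```python
import math

REFERENCE = 2.0 * math.log(2.0)   # the claimed value of the ratio, 2 ln 2

def divided_difference_ratio(cells=400):
    """Ratio of the two sides of (1.8) for the uniform density on (-1, 1) and a step function."""
    width = 1.0 / cells
    double_integral = 0.0
    for i in range(cells):
        x = (i + 0.5) * width            # midpoint in (0, 1), where g = 1
        for j in range(cells):
            y = -(j + 0.5) * width       # midpoint in (-1, 0), where g = 0
            double_integral += 1.0 / (x - y)
    # Density 1/2 in each variable; factor 2 accounts for the mirror region x < 0 < y.
    left_side = 2.0 * 0.25 * double_integral * width * width
    # |g'| d(mu) is a point mass at 0 of weight F(0) = 1/2.
    right_side = 0.5
    return left_side / right_side
```

For the default grid the returned ratio should lie within about $10^{-2}$ of `REFERENCE`; the exact value follows from $\int_0^1 \ln\frac{x+1}{x}\,dx = 2\ln 2$.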

(4.) The best constant is achieved for characteristic functions: When seeking
the best constant in (1.8), it suffices, by a standard truncation argument, to consider
bounded Lipschitz functions $h$. Then, since neither side of the inequality is affected if we
add a constant to $h$, it suffices to consider non-negative Lipschitz functions. We use the
layer-cake representation [11]:

$$h(x) = \int_0^\infty \chi_{\{h>t\}}(x)\,dt .$$

Then

$$\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|h(x)-h(y)|}{|x-y|}\,d\mu(x)\,d\mu(y) \le \int_0^\infty\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|\chi_{\{h>t\}}(x)-\chi_{\{h>t\}}(y)|}{|x-y|}\,d\mu(x)\,d\mu(y)\,dt . \tag{4.1}$$

Define $C_n$ to be the best constant for characteristic functions of sets $A$ and log-concave
measures $\mu$:

$$C_n := \sup_{f,A}\left\{\frac{\displaystyle\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|\chi_A(x)-\chi_A(y)|}{|x-y|}\,d\mu(x)\,d\mu(y)}{\displaystyle\int_{\partial A} e^{-f(x)}\,d\mathcal{H}_{n-1}(x)}\right\} \tag{4.2}$$

where $\mathcal{H}_{n-1}$ denotes $(n-1)$-dimensional Hausdorff measure. Apply this to (4.1) to conclude
that

$$\begin{aligned}
\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \frac{|h(x)-h(y)|}{|x-y|}\,d\mu(x)\,d\mu(y) &\le C_n\int_0^\infty\int_{\partial\{h>t\}} e^{-f(x)}\,d\mathcal{H}_{n-1}(x)\,dt \\
&= C_n\int_{\mathbb{R}^n} |\nabla h(x)|\,d\mu(x) ,
\end{aligned} \tag{4.3}$$

where the co-area formula was used in the last line. Thus, inequality (1.8) holds with the
constant $C_n$; in short, it suffices to consider characteristic functions as trial functions. Note
that the argument is also valid at the level of each measure $\mu$ individually, although we are
interested here in uniform bounds.

With characteristic functions in mind, let us consider the case that $g$ is the characteristic
function of a half-space in $\mathbb{R}^n$. Without loss of generality let us take this to be $\{x : x_1 < 0\}$.
Clearly, the left side of (1.8) is less than the integral with $|x-y|^{-1}$ replaced by $|x_1-y_1|^{-1}$.
Since the marginal (obtained by integrating over $x_2, \dots, x_n$) of a log-concave function is
log-concave, we see that our inequality reduces to the one-dimensional case. In other words,
the constant $C_n$ in (4.2) would equal $C_1$, independent of $n$, if the supremum were restricted
to half-spaces instead of to arbitrary measurable sets.

(5.) Improved constants and geometry of log-concave measures: With additional
assumptions on the measure one can see that the constant is not only bounded in
$n$, but of order $1/\sqrt{n}$. We are grateful to F. Barthe and M. Ledoux for discussions and
improvements in particular cases concerning the constant in Theorem 1.2. This relies on
the Cheeger constant $\alpha(\mu)^{-1} > 0$ associated to the log-concave probability measure $d\mu$,
where $\alpha(\mu)$ is defined to be the best constant in the inequality

$$\forall A \subset \mathbb{R}^n \text{ (regular enough)}, \qquad \mu(A)(1-\mu(A)) \le \alpha(\mu)\int_{\partial A} e^{-f(x)}\,d\mathcal{H}_{n-1}(x) .$$
M. Ledoux suggested the following procedure. Split the function $|x-y|^{-1}$ into two pieces
according to whether $|x-y|$ is less than or greater than $R$, for some $R > 0$. With $h$ being
the characteristic function of $A$, the contribution to the left side of (4.3) for $|x-y| > R$
is bounded above by $2R^{-1}\alpha(\mu)\int_{\partial A} e^{-f(x)}\,d\mathcal{H}_{n-1}(x)$. The contribution for $|x-y| \le R$ is
bounded above in the same manner as in the proof of Theorem 1.2, but this time we only
have to integrate $z$ over the domain $|z| \le R$ in each of the integrals in (3.5). Thus, our
bound $2^n$ is improved by a factor, which is the $d\mu$ volume of the ball $B_R = \{|z| \le R\}$,
once we have used the Brunn-Minkowski inequality for the bound

$$\mu((1-t)B_R + w)^{1-t}\,\mu(tB_R + w)^{t} \le \mu(B_R + w) \le \bar\mu(B_R) := \sup_x \mu(B_R + x) .$$

The final step is to optimize the sum of the contributions of the two terms with respect to
$R$. Thus, if we denote by $C_n(\mu)$ the best constant in the inequality (1.8) of Theorem 1.2 for
a fixed measure $\mu$, we have

$$C_n(\mu) \le \inf_{R>0}\left\{2^n\,\bar\mu(B_R) + 2R^{-1}\alpha(\mu)\right\} \le 2^n . \tag{4.4}$$

Note that if $\mu$ is symmetric (i.e. if $f$ is even), then the Brunn-Minkowski inequality ensures
that $\bar\mu(B_R) = \mu(B_R)$.

Unlike in (1.8), this improved bound depends on $\mu$, but there are situations where this
gives optimal estimates, as pointed out to us by F. Barthe. As an example, consider the case
where $\mu$ is the standard Gaussian measure on $\mathbb{R}^n$. Using the known value of the Cheeger
constant for this $\mu$, and linear trial functions, one finds that the constant is bounded above
and below by a constant times $n^{-1/2}$.

Actually, we can use (4.4) to improve the constant from $2^n$ to $2^{n/2}$ for arbitrary measures
using some recent results from the geometry of log-concave measures. Without loss of
generality, we can assume, by translation of $\mu$, that $\int |x|\,d\mu(x) = \inf_v \int |x+v|\,d\mu(x) =: M_\mu$.
It was proved in [9, 1] that for every log-concave measure on $\mathbb{R}^n$,

$$\alpha(\mu) \le c\,M_\mu$$

where $c > 0$ is some numerical constant (meaning a possibly large, but computable, constant,
in particular independent of $n$ and $\mu$, of course). On the other hand, it was proved
by Guédon [7] that for every log-concave measure $\nu$ on $\mathbb{R}^n$

$$\nu(B_R) \le \frac{C\,R}{\int |x|\,d\nu(x)}$$

for some numerical constant $C > 0$. In the case $\mu$ is not symmetric, we pick $v$ such that
$\mu(B_R + v) = \bar\mu(B_R)$, and then we apply the previous bound to $\nu(\cdot) = \mu(\cdot + v)$ in order to
get that $\bar\mu(B_R) \le C\,R/M_\mu$. Using these two estimates in (4.4) we see that

$$C_n(\mu) \le \inf_{s>0}\left\{C\,2^n s + c/s\right\} = \kappa\,2^{n/2}$$

for some numerical constant $\kappa > 0$.

The Brascamp-Lieb inequality (1.3), as well as inequality (1.8), have connections with
the geometry of convex bodies. It was observed in [2] that (1.3) can be deduced from the
Prékopa-Leindler inequality (which is a functional form of the Brunn-Minkowski inequality).
But the converse is also true: the Prékopa theorem follows, by a local computation,
from the Brascamp-Lieb inequality (see [5] where the procedure is explained in the more
general complex setting). To sum up, the Brascamp-Lieb inequality (1.3) can be seen as
the local form of the Brunn-Minkowski inequality for convex bodies.

5 Application to Conditional Expectations 

Otto and Menz were motivated to prove (1.5) for an application that involves a large
amount of additional structure that we cannot go into here. We shall however give an
application of Theorem 1.1 to a type of estimate that is related to one of the central
estimates in [13].

We use the notation in [4], which is adapted to working with a partitioned set of
variables. Write a point $x \in \mathbb{R}^{n+m}$ as $x = (y,z)$ with $y \in \mathbb{R}^m$ and $z \in \mathbb{R}^n$. For a function
$A$ on $\mathbb{R}^{n+m}$, let $\langle A\rangle_z(y)$ denote the conditional expectation of $A$ given $y$, with respect to
$\mu$. For a function $B$ of $y$ alone, $\langle B\rangle_y$ is the expected value of $B$, with respect to $\mu$. As in
[4], a subscript $y$ or $z$ on a function denotes differentiation with respect to $y$ or $z$, while a
subscript $y$ or $z$ on a bracket denotes integration. For instance, for a function $g$ on $\mathbb{R}^{n+m}$,
$g_y$ denotes the vector $(\partial_{y_i} g)_{i\le m}$ in $\mathbb{R}^m$, and for $i \le m$, $g_{y_i z}$ denotes the vector
$(\partial_{y_i}\partial_{z_j} g)_{j\le n}$ in $\mathbb{R}^n$. Finally, $(g_{yz})$ denotes the $m\times n$ matrix having the previous vectors as rows.

Let $h$ be non-negative with $\langle h\rangle_x = 1$ so that $h(x)\,d\mu(x)$ is a probability measure, and
so is $\langle h\rangle_z(y)\,d\nu(y)$, where $d\nu(y)$ is the marginal distribution of $y$ under $d\mu(x)$.

A problem that frequently arises [BJ El E3H1 [12] EEBJ is to estimate the Fisher information 
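of $\langle h\rangle_z(y)\,d\nu(y)$ in terms of the Fisher information of $h(x)\,d\mu(x)$ by proving an estimate of
the form

$$\int_{\mathbb{R}^m} \frac{|(\langle h\rangle_z)_y|^2}{\langle h\rangle_z}\,d\nu(y) \le C\int_{\mathbb{R}^{n+m}} \frac{|h_x|^2}{h}\,d\mu(x) . \tag{5.1}$$

Direct differentiation under the integral sign in the variable $y_i$ gives

$$(\langle h\rangle_z)_{y_i} = \langle h_{y_i}\rangle_z - \mathrm{cov}_z(h, f_{y_i}) ,$$

where $\mathrm{cov}_z$ denotes the conditional covariance of $h(y,z)$ and $f_{y_i}(y,z)$, integrating in $z$ for
each fixed $y$. Let $u = (u_1, \dots, u_m)$ be any unit vector in $\mathbb{R}^m$. Then, for each $y$,

$$(\langle h\rangle_z)_y\cdot u = \sum_{i=1}^m \langle h_{y_i}\rangle_z\,u_i - \sum_{i=1}^m \mathrm{cov}_z(h, f_{y_i})\,u_i = \langle h_y\rangle_z\cdot u - \mathrm{cov}_z(h, f_y\cdot u) ,$$

and hence, choosing $u$ to maximize the left hand side,

$$|(\langle h\rangle_z)_y|^2 \le 2\,|\langle h_y\rangle_z|^2 + 2\left(\mathrm{cov}_z(h, f_y\cdot u)\right)^2 . \tag{5.2}$$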
By (1.7),

$$|\mathrm{cov}_z(h, f_y\cdot u)| \le \langle|h_z|\rangle_z\,\big\|\lambda_{\min}^{-1}\,|(f_y\cdot u)_z|\big\|_\infty . \tag{5.3}$$

Note that the least eigenvalue of the $n\times n$ block $f_{zz}$ is at least as large as the least
eigenvalue $\lambda_{\min}(y,z)$ of the full Hessian, by the variational principle. Hence, while we are
entitled to use the least eigenvalue of the $n\times n$ block $f_{zz}$ of the full $(n+m)\times(n+m)$
Hessian matrix $f_{xx}$, and this would be important in the application in the one-dimensional
case made in [13], here, without any special structure to take advantage of, we simply use
the least eigenvalue of the full matrix in our bound.

Next note that 

m / n \ 

i=i \j=i j 

n, 

and that $\sum_j (f_{y_i,z_j})^2$ is the $i,i$ entry of $f_{yz}f_{yz}^{T}$, where $f_{yz}$ denotes the upper right corner
block of the Hessian matrix. This number is no greater than the $i,i$ entry of the square
of the full Hessian matrix. This, in turn, is no greater than $\lambda_{\max}^2$. Then, since $u$ is a unit
vector, we have

$$|(f_y\cdot u)_z| \le \lambda_{\max} .$$

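Using this in (5.3), we obtain

$$|\mathrm{cov}_z(h, f_y\cdot u)| \le \langle|h_z|\rangle_z\,\|\lambda_{\max}/\lambda_{\min}\|_\infty , \tag{5.4}$$

and then from (5.2)

$$|(\langle h\rangle_z)_y|^2 \le 2\,|\langle h_y\rangle_z|^2 + 2\,\|\lambda_{\max}/\lambda_{\min}\|_\infty^2\,\langle|h_z|\rangle_z^2 . \tag{5.5}$$

Then the Cauchy-Schwarz inequality yields

$$\langle|h_z|\rangle_z^2 \le \left\langle\frac{|h_z|^2}{h}\right\rangle_z \langle h\rangle_z . \tag{5.6}$$

Use this in (5.5), divide both sides by $\langle h\rangle_z$, and integrate in $y$. The joint convexity in $A$
and $\alpha > 0$ of $A^2/\alpha$ yields (5.1) with the constant $C = 2\,\|\lambda_{\max}/\lambda_{\min}\|_\infty^2$.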

The bound we have obtained becomes useful when $\lambda_{\max}(x)/\lambda_{\min}(x)$ is bounded
uniformly. Suppose that $f(x)$ has the form $f(x) = \varphi(|x|^2)$. Then the eigenvalues of the
Hessian of $f$ are $2\varphi'(|x|^2)$, with multiplicity $m+n-1$, and $4\varphi''(|x|^2)|x|^2 + 2\varphi'(|x|^2)$, with
multiplicity 1. Then both eigenvalues are positive, and the ratio is bounded, whenever $\varphi'$
is positive and, for some $c < 1 < C < \infty$,

$$-c\,\varphi'(s) \le s\,\varphi''(s) \le C\,\varphi'(s) .$$
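The eigenvalue formulas for a radial potential $f(x) = \varphi(|x|^2)$ can be checked with a finite-difference Hessian-vector product applied to the exact gradient $\nabla f(x) = 2\varphi'(|x|^2)x$. The sketch below is illustrative only: the sample profile $\varphi(s) = s + s^2/2$ used in the usage note, the step size, and all function names are assumptions made for this example.

```python
def gradient(phi_prime, x):
    """Exact gradient of f(x) = phi(|x|^2), namely 2 * phi'(|x|^2) * x."""
    s = sum(c * c for c in x)
    return [2.0 * phi_prime(s) * c for c in x]

def hessian_times(phi_prime, x, v, eps=1e-5):
    """Central finite-difference approximation of Hess_f(x) applied to v."""
    plus = gradient(phi_prime, [a + eps * b for a, b in zip(x, v)])
    minus = gradient(phi_prime, [a - eps * b for a, b in zip(x, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

def predicted_eigenvalues(phi_prime, phi_second, s):
    """(tangential, radial) eigenvalues stated in the text, evaluated at s = |x|^2."""
    tangential = 2.0 * phi_prime(s)
    radial = 4.0 * phi_second(s) * s + 2.0 * phi_prime(s)
    return tangential, radial
```

For instance, with $\varphi'(s) = 1 + s$, $\varphi''(s) = 1$ and $x = (1,2,2)$, so $s = 9$, the predicted eigenvalues are 20 (on vectors orthogonal to $x$) and 56 (on $x$ itself), and `hessian_times` applied to $x$ and to an orthogonal vector such as $(2,-1,0)$ reproduces these multiples up to discretisation error.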
5.1 Remark (Other asymmetric variants of the BL inequality). Together, (5.3) and (5.6)
yield

$$\frac{\left(\mathrm{cov}_z(h, f_y\cdot u)\right)^2}{\langle h\rangle_z} \le \left\langle\frac{|h_z|^2}{h}\right\rangle_z \big\|\lambda_{\min}^{-1}\,|(f_y\cdot u)_z|\big\|_\infty^2 .$$

A weaker inequality is 

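$$\frac{\left(\mathrm{cov}_z(h, f_y\cdot u)\right)^2}{\langle h\rangle_z} \le \left\langle\frac{|h_z|^2}{h}\right\rangle_z \|\lambda_{\min}^{-1}\|_\infty^2\,\|(f_y\cdot u)_z\|_\infty^2 . \tag{5.7}$$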

In the context of the application in [15], finiteness of $\|(f_y\cdot u)_z\|_\infty$ limits $f$ to quadratic
growth at infinity. A major contribution of [13] is to remove this limitation in applications
of (5.1). The success of this application of (1.5) depended on the full weight of the inverse
Hessian being allocated to the $L^\infty$ term.

Nonetheless, once the topic of asymmetric BL inequalities is raised, one might enquire
whether an inequality of the type

$$|\mathrm{cov}(g,h)| \le C\,\|\nabla g\|_\infty\,\|\mathrm{Hess}_f^{-1}\nabla h\|_1 \tag{5.8}$$

can hold for any constant $C$. There is no such inequality, even in one dimension. To see
this, suppose that for some $a \in \mathbb{R}$ and some $\epsilon > 0$, $f_{xx} > M$ on $(a-\epsilon, a+\epsilon)$. Take $h(x) = 1$
for $x > a$ and $h(x) = 0$ for $x < a$. Take $g(x) = x - a$. Suppose that $f$ is even about $a$.
Then $\mathrm{cov}(g,h) = \int_a^\infty (x-a)\,e^{-f(x)}\,dx$, while $\|\mathrm{Hess}_f^{-1}\nabla h\|_1 \le M^{-1}$, and $f$ can be chosen
to make $M$ arbitrarily large while keeping $\|\nabla g\|_\infty \le 1$, and $\mathrm{cov}(g,h)$ bounded away from
zero.



6 Appendix 

We recall that the original proof of (1.3), Theorem 4.1 of [4], used dimensional induction,
though interesting non-inductive proofs have since been provided [2].

The starting point for the inductive proof is that the proof for $n = 1$ is elementary. The
proof of the inductive step is more involved, and we take this opportunity to provide more
detail about the passage from eq. (4.9) of [4] to eq. (4.10) of [4]. There is an interesting
connection with the application discussed in the previous section, which also concerns
$\langle h_y\rangle_z - \mathrm{cov}_z(h, f_y)$. We continue using the notation introduced there, but now $m = 1$ (i.e.
$y \in \mathbb{R}$).

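Eq. (4.9) reads $\mathrm{var}(h) \le \langle B\rangle_y$ where

$$B = \mathrm{var}_z(h) + \frac{\left[\langle h_y\rangle_z - \mathrm{cov}_z(h, f_y)\right]^2}{\langle f_{yy}\rangle_z - \mathrm{var}_z(f_y)} . \tag{6.1}$$

Our goal is to prove

$$B \le \langle (h_z, f_{zz}^{-1}h_z)\rangle_z + \frac{\left[\langle h_y - (h_z, f_{zz}^{-1}f_{yz})\rangle_z\right]^2}{\langle f_{yy} - (f_{yz}, f_{zz}^{-1}f_{yz})\rangle_z} . \tag{6.2}$$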
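To do this, use the inductive hypothesis; i.e., for any $H$ on $\mathbb{R}^n$,

$$\mathrm{var}_z(H) \le \langle (H_z, f_{zz}^{-1}H_z)\rangle_z . \tag{6.3}$$

Apply this to an arbitrary linear combination $H = \lambda h + \mu f_y$ to conclude the $2\times 2$ matrix
inequality

$$\begin{pmatrix} \mathrm{var}_z(h) & \mathrm{cov}_z(h, f_y) \\ \mathrm{cov}_z(h, f_y) & \mathrm{var}_z(f_y) \end{pmatrix} \le \begin{pmatrix} \langle (h_z, f_{zz}^{-1}h_z)\rangle_z & \langle (h_z, f_{zz}^{-1}f_{yz})\rangle_z \\ \langle (f_{yz}, f_{zz}^{-1}h_z)\rangle_z & \langle (f_{yz}, f_{zz}^{-1}f_{yz})\rangle_z \end{pmatrix} .$$

Take the determinant of the difference to find that

$$\langle (h_z, f_{zz}^{-1}h_z)\rangle_z - \mathrm{var}_z(h) \ge \frac{\left[\langle (h_z, f_{zz}^{-1}f_{yz})\rangle_z - \mathrm{cov}_z(h, f_y)\right]^2}{\langle (f_{yz}, f_{zz}^{-1}f_{yz})\rangle_z - \mathrm{var}_z(f_y)} . \tag{6.4}$$

Combine (6.1) and (6.4) to obtain

$$B \le \langle (h_z, f_{zz}^{-1}h_z)\rangle_z + \frac{\left[\langle h_y\rangle_z - \mathrm{cov}_z(h, f_y)\right]^2}{\langle f_{yy}\rangle_z - \mathrm{var}_z(f_y)} - \frac{\left[\langle (h_z, f_{zz}^{-1}f_{yz})\rangle_z - \mathrm{cov}_z(h, f_y)\right]^2}{\langle (f_{yz}, f_{zz}^{-1}f_{yz})\rangle_z - \mathrm{var}_z(f_y)} . \tag{6.5}$$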



Since $a^2/\alpha$ is jointly convex in $a$ and $\alpha > 0$, and is homogeneous of degree one, for all
$\alpha > \beta > 0$ and all $a$ and $b$,

$$\frac{a^2}{\alpha} \le \frac{b^2}{\beta} + \frac{(a-b)^2}{\alpha-\beta} .$$

That is, $a^2/\alpha - b^2/\beta \le (a-b)^2/(\alpha-\beta)$. Use this on the right side of (6.5) to obtain
(6.2), noting that the positivity of $\alpha - \beta = \langle f_{yy}\rangle_z - \langle (f_{yz}, f_{zz}^{-1}f_{yz})\rangle_z$ is a consequence of the
positivity of the Hessian of $f$.
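The elementary inequality $a^2/\alpha - b^2/\beta \le (a-b)^2/(\alpha-\beta)$ used in this last step can be spot-checked numerically. The sketch below samples random admissible values; the sampling ranges and the function names are choices made for this illustration.

```python
import random

def subadditivity_gap(a, b, alpha, beta):
    """(a-b)^2/(alpha-beta) + b^2/beta - a^2/alpha; non-negative when alpha > beta > 0."""
    return (a - b) ** 2 / (alpha - beta) + b * b / beta - a * a / alpha

def smallest_gap(trials=5000, seed=1):
    """Smallest gap observed over random samples with alpha > beta > 0."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        beta = rng.uniform(0.1, 5.0)
        alpha = beta + rng.uniform(0.1, 5.0)
        a = rng.uniform(-10.0, 10.0)
        b = rng.uniform(-10.0, 10.0)
        gaps.append(subadditivity_gap(a, b, alpha, beta))
    return min(gaps)
```

The gap vanishes exactly when $a/\alpha = b/\beta$, for example at $(a, b, \alpha, \beta) = (2, 1, 4, 2)$, which is the equality case of the convexity argument.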